Smoothness-Adaptive Contextual Bandits
Authors
Abstract
We study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework where the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing smoothness-adaptive algorithms is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), adapting to unknown smoothness is possible, and we further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of off-the-shelf non-adaptive policies. In settings with either differentiable or non-differentiable payoff functions, this policy matches (up to a logarithmic scale) the regret rate achievable when the smoothness is known a priori.
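To make the two ingredients described above concrete (inferring smoothness from data, then handing the estimate to an off-the-shelf non-adaptive policy), the following is a minimal toy sketch in Python. It is not the paper's policy: it uses a single pre-estimation phase rather than inferring smoothness throughout the horizon, and every name here (f, pull, bin_means, estimate_smoothness) as well as the binned-UCB stand-in for the non-adaptive policy is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)

# Two unknown payoff functions of a scalar covariate x in [0, 1].
f = [lambda x: 0.5 + 0.2 * np.sin(4 * np.pi * x),
     lambda x: 0.5 + 0.2 * np.cos(4 * np.pi * x)]

def pull(arm, x):
    """Bernoulli reward with mean f[arm](x)."""
    return float(rng.random() < f[arm](x))

def bin_means(xs, rs, n_bins):
    """Average observed rewards within equal-width covariate bins."""
    idx = np.minimum((xs * n_bins).astype(int), n_bins - 1)
    means = np.full(n_bins, 0.5)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            means[b] = rs[mask].mean()
    return means

def estimate_smoothness(xs, rs, coarse=4, fine=16):
    """Crude Hoelder-exponent estimate from how local averages change with resolution."""
    gap = np.abs(np.repeat(bin_means(xs, rs, coarse), fine // coarse)
                 - bin_means(xs, rs, fine)).mean()
    gap = max(gap, 1e-3)
    # Treat the coarse/fine discrepancy as ~ h**beta with h = 1/coarse, then clip.
    return float(np.clip(np.log(gap) / np.log(1.0 / coarse), 0.1, 1.0))

T, T0 = 20000, 2000

# Phase 1: uniform exploration of one arm to estimate the smoothness exponent.
xs0 = rng.random(T0)
rs0 = np.array([pull(0, x) for x in xs0])
beta_hat = estimate_smoothness(xs0, rs0)

# Phase 2: off-the-shelf binned UCB with bin width ~ T**(-1/(2*beta_hat + 1)) (d = 1).
n_bins = max(1, int(round(T ** (1.0 / (2 * beta_hat + 1)))))
counts = np.zeros((n_bins, 2))
sums = np.zeros((n_bins, 2))
for t in range(T - T0):
    x = rng.random()
    b = min(int(x * n_bins), n_bins - 1)
    ucb = sums[b] / np.maximum(counts[b], 1) + np.sqrt(2 * np.log(T) / np.maximum(counts[b], 1))
    a = int(np.argmax(ucb))
    counts[b, a] += 1
    sums[b, a] += pull(a, x)

print(f"beta_hat = {beta_hat:.2f}, bins = {n_bins}")

The bin width T^{-1/(2*beta + d)} (here d = 1) is the standard choice for a non-adaptive binned policy when the payoffs are beta-Hoelder; the point of a smoothness-adaptive scheme is to make that choice without knowing beta in advance.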
Similar resources
Unimodal Bandits without Smoothness
We consider stochastic bandit problems with a continuum set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. We propose Stochastic Pentachotomy (SP), an algorithm for which we derive finite-time regret upper bounds. In particular, we sho...
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
We study the off-policy evaluation problem— estimating the value of a target policy using data collected by another policy—under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) an...
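For context, the inverse propensity scoring (IPS) estimator referred to in this snippet is, in standard textbook notation (shown here for reference only; it is not necessarily the exact estimator analysed in that paper):

\hat{V}_{\mathrm{IPS}}(\pi) \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\, r_i,

where the logged triples (x_i, a_i, r_i) are contexts, actions, and rewards collected under the behavior (logging) policy \mu, and \pi is the target policy being evaluated.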
Kernalized Collaborative Contextual Bandits
We tackle the problem of recommending products in the online recommendation scenario, which occurs many times in real applications. The most famous and explored instances are news recommendations and advertisements. In this work we propose an extension to the state of the art Bandit models to not only take care of different users’ interactions, but also to go beyond the linearity assumption of ...
Contextual Dueling Bandits
We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner’s goal is to find the best policy, or way of behaving, in some space of policies, although “be...
Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits that have application in many different fields includin...
Journal
Journal title: Social Science Research Network
Year: 2021
ISSN: 1556-5068
DOI: https://doi.org/10.2139/ssrn.3893198